Symbolic Dynamic Programming for Continuous State and Action MDPs
Authors
Abstract
$Q_a := \int Q_a \otimes P(x'_j \mid b, b', x, a, y) \, dx'_j$ [Symbolic Substitution]

For all $b'_i$ in $Q_a$:
$Q_a := \left[ Q_a \otimes P(b'_i \mid b, x, a, y) \right]|_{b'_i = 1} \oplus \left[ Q_a \otimes P(b'_i \mid b, x, a, y) \right]|_{b'_i = 0}$ [Case $\oplus$]

Compute the final Q-value (discount and add reward):
$Q_a := R(b, x, a, y) \oplus (\gamma \otimes Q_a)$

Note that $\int f(x'_j) \otimes \delta[x'_j - h(z)] \, dx'_j = f(x'_j)\{x'_j / h(z)\}$, where the latter operation indicates that any occurrence of $x'_j$ in $f(x'_j)$ is symbolically substituted with the case statement $h(z)$ (Sanner, Delgado, de Barros, UAI 2011).
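The backup steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: case statements are modeled here as lists of lazily evaluated (condition, value) closures rather than as a true symbolic structure, and every name in it (`apply_op`, `substitute`, `restrict`, the toy value function, dynamics, transition probabilities, and reward) is a hypothetical choice made for illustration. The sketch assumes deterministic continuous dynamics, so the integral against the delta function reduces to a substitution.

```python
from itertools import product
from operator import add, mul

# A case statement is a list of (condition, value) partitions. Both entries
# are functions of an assignment dict; partitions are assumed to be
# mutually exclusive and exhaustive.

def const(c):
    """Case statement with a single partition holding the constant c."""
    return [(lambda s: True, lambda s: c)]

def evaluate(f, s):
    """Return the value of the unique partition whose condition holds at s."""
    for cond, val in f:
        if cond(s):
            return val(s)
    raise ValueError("no partition matches the assignment")

def apply_op(f, g, op):
    """Binary case operation: cross-product of partitions, conjoined conditions."""
    return [
        (lambda s, cf=cf, cg=cg: cf(s) and cg(s),
         lambda s, vf=vf, vg=vg: op(vf(s), vg(s)))
        for (cf, vf), (cg, vg) in product(f, g)
    ]

def substitute(f, var, h):
    """f{var/h}: the result of integrating f against delta[var - h]."""
    return [
        (lambda s, c=c: c({**s, var: evaluate(h, s)}),
         lambda s, v=v: v({**s, var: evaluate(h, s)}))
        for c, v in f
    ]

def restrict(f, var, val):
    """f restricted to var = val (substitution of a constant)."""
    return substitute(f, var, const(val))

# Hypothetical toy model: one continuous variable x, one boolean variable b.
GAMMA = 0.9
V_next = [  # next-state value: x' if b' = 1 and x' > 5, else 0
    (lambda s: s["b_next"] == 1 and s["x_next"] > 5, lambda s: s["x_next"]),
    (lambda s: not (s["b_next"] == 1 and s["x_next"] > 5), lambda s: 0.0),
]
h = [(lambda s: True, lambda s: s["x"] + s["a"])]    # dynamics: x' = x + a
P_b = [(lambda s: s["b_next"] == 1, lambda s: 0.7),  # P(b' = 1) = 0.7
       (lambda s: s["b_next"] == 0, lambda s: 0.3)]  # P(b' = 0) = 0.3
R = [(lambda s: True, lambda s: -s["a"])]            # reward: action cost

# 1. Continuous integration via symbolic substitution.
Q = substitute(V_next, "x_next", h)
# 2. Marginalize the boolean next-state variable with the case sum.
Q = apply_op(Q, P_b, mul)
Q = apply_op(restrict(Q, "b_next", 1), restrict(Q, "b_next", 0), add)
# 3. Discount and add reward.
Q = apply_op(R, apply_op(const(GAMMA), Q, mul), add)

print(evaluate(Q, {"x": 4.0, "a": 2.0}))
```

With x = 4 and a = 2 the substituted next state is 6, so the sketch evaluates to about -2 + 0.9 * (0.7 * 6) = 1.78. A real implementation would keep the partitions symbolic (for example in an XADD) and prune infeasible ones instead of evaluating closures pointwise.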
Similar Resources
Symbolic Dynamic Programming for Discrete and Continuous State MDPs
Many real-world decision-theoretic planning problems can be naturally modeled with discrete and continuous state Markov decision processes (DC-MDPs). While previous work has addressed automated decision-theoretic planning for DC-MDPs, optimal solutions have only been defined so far for limited settings, e.g., DC-MDPs having hyper-rectangular piecewise linear value functions. In this work, we ext...
Symbolic Dynamic Programming for Continuous State and Observation POMDPs
Point-based value iteration (PBVI) methods have proven extremely effective for finding (approximately) optimal dynamic programming solutions to partially observable Markov decision processes (POMDPs) when a set of initial belief states is known. However, no PBVI work has provided exact point-based backups for both continuous state and observation spaces, which we tackle in this paper. Our key in...
Symbolic Dynamic Programming
Decision-theoretic planning aims at constructing a policy for acting in an uncertain environment that maximizes an agent’s expected utility along a sequence of steps that solve a goal. For this task, Markov decision processes (MDPs) have become the standard model. However, classical dynamic programming algorithms for solving MDPs require explicit state and action enumeration, which is often imp...
Stochastic Dynamic Programming with Markov Chains for Optimal Sustainable Control of the Forest Sector with Continuous Cover Forestry
We present a stochastic dynamic programming approach with Markov chains for optimal control of the forest sector. The forest is managed via continuous cover forestry and the complete system is sustainable. Forest industry production, logistic solutions and harvest levels are optimized based on the sequentially revealed states of the markets. Adaptive full system optimization is necessary for co...
Bounded Approximate Symbolic Dynamic Programming for Hybrid MDPs
Recent advances in symbolic dynamic programming (SDP) combined with the extended algebraic decision diagram (XADD) data structure have provided exact solutions for mixed discrete and continuous (hybrid) MDPs with piecewise linear dynamics and continuous actions. Since XADD-based exact solutions may grow intractably large for many problems, we propose a bounded error compression technique for XA...